Approximate Inference Turns Deep Networks into Gaussian Processes
Mohammad Emtiyaz Khan, Alexander Immer, Ehsan Abedi, Maciej Korzepa
Deep neural networks (DNNs) and Gaussian processes (GPs) are two powerful models connected by several theoretical results, but the relationship between their training methods is not well understood. In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors. This enables us to relate the solutions and iterations of a deep-learning algorithm to GP inference. As a result, we can obtain a GP kernel and a nonlinear feature map while training a DNN. Surprisingly, the resulting kernel is the neural tangent kernel. We show kernels obtained on real datasets and demonstrate the use of the GP marginal likelihood to tune hyperparameters of DNNs. Our work aims to facilitate further research on combining DNNs and GPs in practical settings.
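The abstract's central objects, the feature map and the kernel, can be sketched concretely. The following is a minimal, hypothetical JAX example (the network, shapes, and function names are ours for illustration, not the paper's code): the feature map is the gradient of the network output with respect to the weights, and its inner product gives the neural tangent kernel evaluated at the trained weights.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # A small two-layer MLP with scalar output; stands in for any DNN f(x; w).
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(W1 @ x + b1)
    return (W2 @ h + b2).squeeze()

def feature_map(params, x):
    # phi(x) = grad_w f(x; w): the Jacobian of the output with respect to
    # the weights, flattened into a single vector.
    grads = jax.grad(mlp)(params, x)
    return jnp.concatenate([g.ravel() for layer in grads for g in layer])

def kernel(params, x1, x2):
    # kappa(x, x') = phi(x) . phi(x'): the neural tangent kernel at these weights.
    return feature_map(params, x1) @ feature_map(params, x2)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = [(0.1 * jax.random.normal(k1, (8, 3)), jnp.zeros(8)),
          (0.1 * jax.random.normal(k2, (1, 8)), jnp.zeros(1))]
x1, x2 = jax.random.normal(k3, (2, 3))
print(kernel(params, x1, x2))
```

In the paper's setting the weights would be those found by the approximate-inference procedure, so the kernel can be read off during or after training.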
Reviews: Approximate Inference Turns Deep Networks into Gaussian Processes
There is room for improvement in the experiments. I think the main contribution of this paper is a method that transforms a complicated neural network into a nonlinear feature map, so that the weights and the feature map are linearly separated. Given the feature map, kernels/correlations and posterior distributions over output functions can be built explicitly for a BNN (or DNN). Therefore, I would expect to see: 1. What does this feature map look like? I think the authors show the kernel rather than the map itself.
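On the reviewer's first question: the weight/feature separation described above is, in our own notation (a sketch consistent with the abstract's claims, not a formula quoted from the paper), the first-order expansion of the network around the trained weights w_*:

```latex
f(x; w) \approx f(x; w_*) + J_{w_*}(x)\,(w - w_*),
\qquad J_{w_*}(x) := \nabla_w f(x; w)\big|_{w = w_*}
```

The model is then linear in w with feature map \phi(x) = J_{w_*}(x)^\top, so a Gaussian posterior approximation \mathcal{N}(w_*, \Sigma) over the weights induces a GP over outputs with covariance \kappa(x, x') = J_{w_*}(x)\,\Sigma\,J_{w_*}(x')^\top; for \Sigma \propto I this is a scaled neural tangent kernel, matching the abstract.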
This paper demonstrates theoretically that multiple forms of approximate Bayesian inference (the Laplace approximation and variational inference) for deep neural networks are equivalent to Gaussian processes. The authors formalize this connection and write out the GP covariance function corresponding to these networks, which surprisingly turns out to be the neural tangent kernel. The authors also establish a connection between the training procedure of neural networks and GP inference, which is a novel contribution. There is a growing literature on the connection between neural networks and Gaussian processes, with a variety of papers establishing the connection in the limit of infinitely many hidden units. This paper adds nicely to that literature by developing a connection to approximate Bayesian inference.
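The abstract also mentions tuning DNN hyperparameters with the GP marginal likelihood. A minimal sketch of that quantity, assuming standard zero-mean GP regression (the kernel matrix K would be built from the recovered kernel; the function names here are ours):

```python
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_solve

def log_marginal_likelihood(K, y, noise_var):
    # log N(y | 0, K + noise_var * I): the GP evidence used for model selection.
    n = y.shape[0]
    L = jnp.linalg.cholesky(K + noise_var * jnp.eye(n))
    alpha = cho_solve((L, True), y)
    return (-0.5 * y @ alpha
            - jnp.sum(jnp.log(jnp.diag(L)))
            - 0.5 * n * jnp.log(2.0 * jnp.pi))

# Toy usage: compare candidate noise levels by their evidence.
key = jax.random.PRNGKey(1)
A = jax.random.normal(key, (5, 5))
K = A @ A.T                      # any positive semi-definite kernel matrix
y = jnp.ones(5)
for s2 in (0.1, 1.0):
    print(s2, float(log_marginal_likelihood(K, y, s2)))
```

Maximizing this quantity over hyperparameters is the standard GP model-selection recipe, which the abstract says the paper demonstrates for DNN hyperparameters once the kernel is in hand.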